Add CHEDDAR GPU backend/emitter#2896
Conversation
j2kun
left a comment
There was a problem hiding this comment.
I will review the IR parts after office hours, but one major request below
| ) | ||
|
|
||
| # CHEDDAR GPU FHE library (opt-in, requires CUDA) | ||
| cheddar_extensions = use_extension("//bazel:extensions.bzl", "cheddar_deps") |
There was a problem hiding this comment.
So there's a lot of stuff in this PR around adding the cheddar backend (bazel overlay, opt-in config, etc.) but as far as I can tell nothing tests it because there are no end-to-end tests in this PR.
Can we either include an end-to-end test in this PR, or else have this PR only include the MLIR-only parts? (dialect, transforms, lit tests, emitter).
In particular, when I pull this branch to my local machine and I try running bazel build @cheddar, I get
$ bazel build @cheddar
ERROR: /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD: no such target '@@rules_cuda++toolchain+cuda//:thrust': target 'thrust' not declared in package '' defined by /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD
ERROR: /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/+cheddar_deps+cheddar/BUILD.bazel:50:11: no such target '@@rules_cuda++toolchain+cuda//:thrust': target 'thrust' not declared in package '' defined by /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD and referenced by '@@+cheddar_deps+cheddar//:cheddar'
ERROR: Analysis of target '@@+cheddar_deps+cheddar//:cheddar' failed; build aborted: Analysis failed
INFO: Elapsed time: 1.340s, Critical Path: 0.02s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfullyThere was a problem hiding this comment.
@missing tests: indeed, that's an oopsie on my side while extracting this out of the overall cheddar WIP branch. As we discussed in the office hour yesterday, I'll split this PR into three
- Dialect
- Emitter and FileCheck tests (I think I'll re-use this PR for this, to preserve the relevant review comments)
- Bazel dependency & end-to-end tests
@ local machine: Does your local machine have an Nvidia GPU with CUDA/etc installed? Otherwise, I'm pretty sure that this is an expected failure, due to @rules_cuda being unable to find the required CUDA related stuff? Maybe there's a way to set up the BUILD file to get a nicer/more helpful error message? Since you're directly asking bezel to build the cheddar target, enable_cheddar isn't "protecting" you here, I guess?
There was a problem hiding this comment.
From https://bazel-contrib.github.io/rules_cuda/0.3.0/
For hermetic toolchains, the rules handle toolchain configuration and library downloading automatically. See cuda.redist_json integration test for a comprehensible example.
For locally installed toolchains, _detect_local_cuda_toolkit and detect_clang determines how they are detected.
Either situation depends on cc toolchain availability, so you must also ensure the cc compiler is properly configured. On Windows, this means that you will also need to set the environment variable BAZEL_VC properly.
I would like to try this again on a machine that has an NVIDIA GPU as well as the hermetic LLVM toolchain. In theory, with the hermetic toolchain we should be able to build the targets even if the machine does not have a GPU or CUDA installed. But yes, we should split that into a standalone PR.
A quick look into this and it appears the Google org does not have any runners configured with a GPU. So this may be an obstacle and might force us to set up a Google-internal CI that runs on GCP and mirrors its results to GitHub. |
| }]; | ||
| let parameters = (ins | ||
| StringRefParameter<"path to JSON parameter file">:$path, | ||
| DefaultValuedParameter<"bool", "false", "use 64-bit word type">:$use64Bit |
There was a problem hiding this comment.
This field appears to be unused. (In fact, this entire attribute appears to be unused)
| let results = (outs Cheddar_Ciphertext:$output); | ||
| } | ||
|
|
||
| def Cheddar_MadUnsafeOp : Cheddar_Op<"mad_unsafe"> { |
There was a problem hiding this comment.
That's (for whatever reason) what Cheddar calls the function: https://github.com/scale-snu/cheddar-fhe/blob/main/include/core/Context.h#L377 and I figure making the exit dialects map closely to the library API is more important than making the naming consistent with other HEIR ops?
There was a problem hiding this comment.
Persisting what appears to be a typo...is just the right amount of whimsy. Maybe leave a comment for future confused people.
| // `GetRotKeyOp` stores a static distance attribute, but dynamic | ||
| // `ckks.rotate` lowering still needs a placeholder key op so the emitter can | ||
| // trace back to the `UserInterface`. This sentinel distance marks that case. | ||
| constexpr int64_t kDynamicRotationKeyDistanceSentinel = -1; |
There was a problem hiding this comment.
Is there a reason why you don't want to allow the GetRotKeyOp to take both dynamic and static options as well?
There was a problem hiding this comment.
Seeing this emitter makes me think we should invest some time into converting these emitters to use the emitc dialect, so that the vast majority of the codegen details can be avoided. 😰
There was a problem hiding this comment.
Looking at emitC, it's come a long way since I last used it in HECO, and the explicit variable handling looks really nice! I'm kind of tempted to try rewriting the Cheddar emitter into EmitC.... 😅
| while (currentLevel > targetLevelVal) { | ||
| current = cheddar::RescaleOp::create(rewriter, op.getLoc(), cheddarCtType, | ||
| ctx.value(), current); | ||
| --currentLevel; | ||
| auto encodedOne = cheddar::EncodeConstantOp::create( | ||
| rewriter, op.getLoc(), cheddar::ConstantType::get(getContext()), | ||
| encoder.value(), one, rewriter.getI64IntegerAttr(currentLevel), | ||
| rewriter.getI64IntegerAttr(*logDefaultScale)); | ||
| current = | ||
| cheddar::MultConstOp::create(rewriter, op.getLoc(), cheddarCtType, | ||
| ctx.value(), current, encodedOne); |
There was a problem hiding this comment.
A comment to explain why this needs a loop would be helpful.
| // multiplies), so it maps directly onto CHEDDAR's logScale field. | ||
| int64_t logScale = invEncoding.getScalingFactor(); | ||
|
|
||
| // lwe.rlwe_encode doesn't carry a plaintext level on main, so encode at |
There was a problem hiding this comment.
It has a RingAttr doesn't it? Isn't that support to encode the level via its RNS coefficient type? We might not instantiate those properly at the moment, but I know for sure the level analysis properly determines the needed levels of plaintexts involved in Pt-Ct ops, cf. encodeCleartextAsPlaintext. If there is a gap here please find/file the right GH issue and link it here with a TODO
| target.addLegalDialect<cheddar::CheddarDialect>(); | ||
| target.addIllegalDialect<ckks::CKKSDialect>(); | ||
| target | ||
| .addIllegalOp<lwe::RLWEEncryptOp, lwe::RLWEDecryptOp, lwe::RLWEEncodeOp, |
There was a problem hiding this comment.
Why not make the LWE dialect illegal wholesale? Cheddar has its own types, after all.
| return failure(); | ||
| } | ||
|
|
||
| // Scaling factors on CKKS encodings are stored log-additively on main (a |
There was a problem hiding this comment.
nit: comments talking about "on main" will become strange once they are merged into main.
Co-authored-by: Jeremy Kun <kun.jeremy@gmail.com>
Part of the work to make Cheddar a first-class backend, extracted to its own PR because the diff was getting out of hand :)
This adds the backend itself, i.e., the
cheddardialect,--scheme-to-cheddarpasses, emitter, etc and a few basic tests (including some very simple "e2e" ones). It also adds Cheddar itself as an optional (default) off Bazel dependency.Question: Do we want to run the e2e GPU tests on CI? If yes, what runner/etc should we be using?
This PR does not yet enable HEIR to target more complex programs to Cheddar, because Cheddar has very strict ideas about the acceptable scales of things, and with just this PR, HEIR's mgmt layer still has no idea about the Rational-Rescale-ish "25-30" system Cheddar uses.